A quickly trainable hybrid SOM-based document organization system
Authors
Abstract
The large volume of today's document collections has increased the need for quickly trainable document organization systems. This paper presents and evaluates a hybrid system for the self-organization of massive document collections based on the self-organizing map (SOM). The hybrid system uses prototypes generated by a clustering algorithm to train the document maps, thus reducing the training time of large maps. We test the system with the k-means and modified leader clustering algorithms. The experiments are carried out on the Reuters-21578 v1.0 and 20 Newsgroups collections. The performance of the system is measured in terms of text categorization effectiveness on the test set and in terms of training time. Experimental results show that the proposed system generates effective document maps in less time than SOM. However, the hybrid system using k-means generates better document maps than the one using modified leader, at the cost of a longer training time. © 2008 Elsevier B.V. All rights reserved.
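The abstract does not spell out the training pipeline, so the following is only a minimal Python/NumPy sketch of the general idea, under stated assumptions: documents are already vectors (for example TF-IDF), a clustering step condenses them into prototypes, and the SOM is trained on those prototypes rather than on every document. The details of the paper's modified leader algorithm are not given here, so the hypothetical function leader_clustering implements the classic single-pass leader algorithm with an assumed Euclidean threshold. The function train_som, the map size, the learning-rate and radius schedules, and the toy data are likewise illustrative assumptions, not the authors' settings.

```python
import numpy as np


def leader_clustering(X, threshold):
    """Classic single-pass leader clustering (NOT the paper's modified variant).
    Each vector joins the nearest existing leader if it lies within `threshold`
    (Euclidean distance, an assumption); otherwise it starts a new cluster.
    Returns one prototype per cluster: the mean of its members."""
    leaders = []  # first vector of each cluster acts as its leader
    members = []  # indices of the vectors assigned to each cluster
    for i, x in enumerate(X):
        if leaders:
            d = np.linalg.norm(np.asarray(leaders) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= threshold:
                members[j].append(i)
                continue
        leaders.append(x)
        members.append([i])
    return np.array([X[idx].mean(axis=0) for idx in members])


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training with a Gaussian neighbourhood; the learning rate
    and neighbourhood radius decay linearly (hypothetical schedule)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, data.shape[1]))
    # 2-D grid position of every map unit, used for neighbourhood distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # best-matching unit
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)  # pull units toward x
            t += 1
    return weights


# Hypothetical toy "document vectors": three well-separated groups of 200.
rng = np.random.default_rng(42)
centers = np.eye(3, 10) * 5
docs = np.vstack([c + rng.normal(scale=0.3, size=(200, 10)) for c in centers])

protos = leader_clustering(docs, threshold=2.0)  # condense documents
som = train_som(protos, rows=4, cols=4)          # train the map on prototypes only
# Map every original document onto the trained map via its best-matching unit.
bmus = np.argmin(((docs[:, None, :] - som[None, :, :]) ** 2).sum(axis=2), axis=1)
print(len(docs), "documents ->", len(protos), "prototypes")
```

With this setup the SOM's per-epoch cost scales with the number of prototypes instead of the number of documents, which is presumably where the reduction in training time comes from. Replacing leader_clustering with k-means centroids (for example the cluster_centers_ attribute of scikit-learn's KMeans) would correspond to the k-means variant, which usually costs more to compute than a single leader pass.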
Similar articles
Content-based hierarchical document organization using multi-layer hybrid network and tree-structured features
Automatically organizing documents into a hierarchical tree is in demand in many real applications. In this work, we focus on the problem of content-based document organization through a hierarchical tree, which can be viewed as a classification problem. We propose a new document representation to enhance the classification accuracy. We develop a new hybrid neural network model to handle the n...
A hybrid system combining self-organizing maps with case-based reasoning in wholesaler's new-release book forecasting
In this paper, we propose a hybrid system that combines the self-organizing map (SOM) neural network with the case-based reasoning (CBR) method for sales forecasting of newly released books. CBR systems have been successfully used in several domains of artificial intelligence. To enhance the efficiency and capability of CBR systems, we connect the SOM method to deal with cluster problems of CBR...
Developing A Fault Diagnosis Approach Based On Artificial Neural Network And Self Organization Map For Occurred ADSL Faults
Telecommunication companies have received a great deal of research attention, which have many advantages such as low cost, higher qualification, simple installation and maintenance, and high reliability. However, the use of technical maintenance approaches in telecommunication companies could improve system reliability and users' satisfaction with Asymmetric digital subscriber line (ADSL) ser...
A combination of Wilcoxon test and R-estimates for document organization and retrieval
The Wilcoxon signed-rank test is exploited for document organization and retrieval in this paper. A novel modeling method for documents and a distance metric between documents are proposed. Both document modeling and document comparisons are based on signed-ranks and are applied to the frequency of occurrence of the document bigrams. A metric using the Wilcoxon signed-rank test exploits these s...
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents
This paper presents a method that hybridizes the self-organizing map (SOM), particle swarm optimization (PSO), and the k-means clustering algorithm for document clustering. Document representation is an important step for clustering purposes. The common way to represent a text is the bag-of-words approach. This approach is simple but has two drawbacks, namely synonymy and polysemy, which arise because of...
Journal title:
- Neurocomputing
Volume 71, Issue
Pages -
Publication date: 2008